OpenAstrocytes

AI-ready dynamic activity from the astrocyte network

Maxine Levesque

Kira Poskanzer

 

Latest

October 31, 2025

tl;dr

Astrocytes make up about a third of the cells in your brain, and form a second, distinct brain network from the more-studied neuronal network. As neuroscience has discovered in only the last few years, the astrocyte network carries out extremely important computations to integrate information from across different brain networks, and to use this information to control physiological state and maintain long-term homeostasis. To help push the field’s understanding of astrocytes’ unique role forward, we’re releasing OpenAstrocytes—the largest repository of structured astrocyte calcium dynamics to date, drop-in ready for use in AI workflows.

To take a look at it yourself, run

pip install astrocytes

 


Background

The human brain is about one-third neurons. But two-thirds of it isn’t; and, in recent years, the neuroscience field has become more acutely aware of the incredible importance of the roles played by those other sorts of cells.

Figure 1: Schematic diagram of the tripartite synapse. Astrocytic processes (green) wrap around most cortical synaptic connections between neurons, allowing astrocytes to integrate the myriad chemical signals sensed in the neuronal chatter, and downstream, to incorporate that information with other signals about the local brain network sensed from other parts of the extracellular space.

One of those other thirds is made up of astrocytes—a crucially important subdivision of the brain’s little-understood glia (from “glue”, referencing old theories of their supposed passivity in brain function). Astrocytes form a cellular network woven between the neurons Figure 1, and as recent work has demonstrated (1), this network encodes rich information about local and global cheical physiological state.

All biological cells must incorporate a massive number of data-points, coming both from the outside world and also from the internal processes of the cell; indeed, this binding-together of disparate interactions into a shared, coherent, soliton-like packet is, perhaps, what makes life life. While all the diverse kinds of cells across the entire diverse tree of life have unique approaches to sensing and making sense of these signals, because of shared evolutionary history, a few common structural threads span vast swaths of this diversity. One of those central commonalities is the use of intracellular calcium signaling.

But astrocytes have a unique, and incredibly rich, way of signaling with calcium; the best way to appreciate it is to look at it by eye:

Figure 2: (Left) Dynamics of astrocyte calcium, as recorded with two-photon imaging of mouse cortical slices expressing the calcium-binding fluorescent indicator GCaMP. (**Right*) Segmentation of this activity into discrete events using the AQuA toolbox. Reproduced from (2).

Naturalistic astrocyte calcium activity is composed of discrete calcium events—high-dimensional, complex, activity that is dynamically coupled to both neuronal activity and activity within the rest of the astrocyte network. While recent work using more classical ML techniques has gotten us closer to cracking this “calcium code” in the astrocyte network, the rise of today’s AI tools raises the possibility of being able to find so many more features in this rich imaging that we, by eye, could never have even conceived of. What’s needed is a data repository designed from the ground up to take advantage of these tools.

 


Architecture

The best biological data science today is heavily leveraging the advances that have been made in AI over the last few years, and taking advantage of these tools to understand the latent structure of new biological datasets. We have previously used advances in machine learning to get at new features of our data that have led us to exciting new biology (1); but harnessing the full power of the methods that have risen to prominence over the last few years—and the new infrastructure and logistical challenges they present—has necessitated a ground-up reconceptualization of how we work with data at Forecast.

In particular, we wanted to take advantage of both:

  1. the ease and flexibility provided by streaming data for prototyping and deploying AI workflows; and,
  2. the richness of structured data that allows us to quickly make sense of the different metadata elements for samples or batches streamed in.

The former of these points is crucial for the nuts and bolts of scaling model training and inference, and has motivated advances in data tooling for AI like WebDatasets (3). The latter, we believe strongly, is a crucial point of departure for biomedical data; while immense progress has been made across domains by taking advantage of the incredible empirical power that lies in scaling, we believe architecting workflows at their foundation around keeping track of the richness of data’s explicit structure—and, most crucially, the structured relationships between those explicit data structures (4)—is a central part of how to leverage the power of these methods for biological applications, where the intrinsic squishy messiness makes purely tabula rasa inference of latent structure much more of an uphill battle.

Backbone: Distributed, typed WebDatasets with atdata

To realize this, we built the OpenAstrocytes dataset on atdata, a new open-source package for structured, distributed, and interoperable streaming datasets we’re developing at Forecast to aid in AI training and inference, both in our own work and for the larger investigative community.

The original motivation for atdata internally was to have fast infrastructure for putting together our rich, multimodal datasets that speeds up our iteration cycle for new AI R&D; we’re excited about the features we’re working on rolling out to the community soon!

atdata is built on the open WebDataset format, designed for use as a high-performance AI I/O layer used by many other research groups, and so is able to take advantage of their streaming, caching, smart batching, and drop-in PyTorch connections.

atdata adds functionality that makes it straightforward to understand structured data from remote repositories:

import astrocytes

# Dataset sample type schema
from astrocytes.schema import BathApplicationFrame

dataset = astrocytes.data.bath_application \
            .as_type( BathApplicationFrame )

# Pop off a single sample
exemplar = next( x for x in dataset.ordered( batch_size = None ) )

# Pop off and z-project a single batch
batch = next( x for x in dataset.ordered( batch_size = 60) )
batch_summary = np.quantile( batch.image[:, 0], 0.75, axis = 0 )

with HouseStyle() as s:
    fig, ax = plt.subplots( figsize = (8, 8) )
    plot_micrograph( s, batch_summary, ax,
        scale_x = exemplar.scale_x,
        scale_y = exemplar.scale_y,
        scale_bar = 100.,
        clim = (0, 140),
    )
Figure 3: Example data from the OpenAstrocytes dataset, showing mean calcium fluorescence in an ex vivo brain slice from a mouse over 60 frames (about one minute).

Here, the as_type method is hiding logic under the hood that takes advantage of pre-implemented bidirectional lenses that are opinionated about the way that generic microscopy images should be interpreted, if possible, as data coming from the specific experiments in the OpenAstrocytes data. These lenses have a rich compositional structure, and we are excited about the possibilities that follow downstream from starting at bedrock by keeping track of

The full documentation for atdata can be found on GitHub.

Datasets

Our initial release of OpenAstrocytes is built from two large collections of previously-published calcium imaging from mouse astrocytes, originally analyzed in (5) and (1). In total, these datasets comprise in total about 1M images, making this release among the the largest open datasets of astrocyte calcium dynamics to date, to our knowledge.

This initial release of the OpenAstrocytes corpus contains two-photon imaging of astrocyte intracellular calcium. These data were recorded during two different experimental setups:

  1. Bath application (astrocytes.data.bath_application): in these experiments, ex vivo visual cortical slices from mice were exposed to one of two different agents (baclofen or t-ACPD) after a baseline period, with the agents added to the solution bathing the full slice after the time of addition.
  2. Two-photon photo-uncaging (astrocytes.data.uncaging): in these experiments, ex vivo visual cortical slices from mice were exposed to one of two caged neurotransmitters (RuBi-GABA or RuBi-glutamate), with a pulse (or pulses) of a second laser applied after a baseline period to “free” the GABA or glutamate from the ruthenium bipyridine cage rendering it inert initially, but only within the small, ~10µm locus directly stimulated by the second laser (6).

If you want to play around with OpenAstrocytes, you can add it to your python project with

# with `pip`:
pip install astrocytes
# or, with `uv`:
uv add astrocytes

 


Demo

Let’s take a look at some of the fun structure that we have in the OpenAstrocytes data. Here, we’ll be digging into one of the experiments from (1), in which ex vivo slices from mouse visual cortex had two drugs—baclofen and t-ACPD1—bath-applied across the slice. Here, we’ll use the OpenAstrocytes dataset with contemporary tools to give views into the differences between these two compounds’ effects on brain networks.

1 Baclofen acts as an agonist of the GABAB receptor; t-ACPD acts as an agonist of the metabotropic glutamate receptor subtypes mGluR2 and mGluR3

At the outset, we’ll tuck away a few parameters of this experimental setup that will come up across analyses:

Code
t_intervene = 300
"The applied compound begins to enter the bath at 300s after recording onset"

Approach

In previous work (1), we identified the differences in astrocyte networks’ responses to baclofen and t-ACPD by using a spatial-temporal segmentation technique known as graphical time warping (7), as implemented in the AQuA toolbox (2). However, recent advances in large, self-supervised image models like DINOv3 (8) .

The overall approach presented in this demo is to run inference on the OpenAstrocytes bath application data using one of the vanilla DINOv3 models (dinov3-vit7b16-pretrain-lvd1689m), and leverage the fact that this model’s internal architecture separates out its image feature representations into patch embeddings. Each microscopy image is separated by the model into a 14-by-14 grid of image patches, with each patch corresponding to a ~60µm square portion of the imaged slice. At each time point over the course of an individual recorded movie, DINOv3 provides each one of these patches with a time series of 4096-dimensional embeddings, corresponding to its decomposition of the image into representative features as learned from its self-supervision over a vast swath of (principally non-biological, but with some exceptions) images.

Here, we use a simple linear basis change (via principal component analysis) to find a 64-dimensional subspace of this 4096-dimensional space already learned within the DINOv3 self-superevision objective that captures the subset of overall image features that appear in individual astrocyte calcium patches. This subspace captures both the image feature variability across different local astrocyte network morphologies as imaged in individual patches (as in Figure 8), as well as the image feature variability induced by the specific changes of the applied pharmacology in the bath application experiment. As it turns out, the richness in these feature representations—even in the absence of any biology-specific fine-tuning!—is enough to decode the identity of the applied compound from the calcium fluorescence pattern within a single ~60µm patch at a single time-point.

DINOv3 patch embeddings capture rich, dynamic responses within astrocyte networks

First, we’ll take a look at the latent structure of local patches of astrocyte network calcium, and how those features unfold after application of pharmacology.

Individual patch responses

First, we’ll download a representative movie from the OpenAstrocytes bath application dataset—the same recording we projected above in Figure 3 to take a look at the overall anatomy of the slice—and examine the behavior of a couple representative patches from it.

Code
# Snag a convenient smoother for image preprocessing
from scipy.ndimage import gaussian_filter

## Constants
N_PATCHES_Y = 14
N_PATCHES_X = 14

## Data hyperparameters
test_patches = [
    (10, 12),
    (3, 7),
]
"Indices (vertical, horizontal) of the demo patches to show"

first_movie_id = 'd22f6a65-b3e2-46c2-ba5c-551920af1fe3'
"Microscope-supplied UUID of the demo movie to show"

##

movie_frames = _get_movie( first_movie_id, verbose = False )

first_image = movie_frames[0].image[0]
patch_size_y = first_image.shape[0] / N_PATCHES_Y
patch_size_x = first_image.shape[1] / N_PATCHES_X

test_patch_frames = []
for cur_i, cur_j in test_patches:
    patch_idx_y = ( int( np.floor( patch_size_y * cur_i ) ),
                    int( np.ceil( patch_size_y * (cur_i + 1) ) ) )
    patch_idx_x = ( int( np.floor( patch_size_y * cur_j ) ),
                    int( np.ceil( patch_size_y * (cur_j + 1) ) ) )

    test_patch_frames.append(
        np.array( [ 
            frame.image[0][patch_idx_y[0]:patch_idx_y[1], :][:, patch_idx_x[0]:patch_idx_x[1]]
            for frame in movie_frames
        ] )
    )

test_patch_frames_filtered = [
    gaussian_filter( fs, sigma = (3., 0.6, 0.6) )
    for fs in test_patch_frames
]

## Figure parameters

patch_cmax = 110
"Maximum intensity value for patch snapshot figure"

i_frame_compare = (250, 350, 450, 550, 650, 750)
"Indices of time-slices to display for patch snapshot figure"

#

dt = movie_frames[3].t
n_patches = len( test_patch_frames_filtered )
n_frame_compare = len( i_frame_compare )

##

with HouseStyle() as s:

    fig, axs = plt.subplots( n_patches, n_frame_compare,
        figsize = (10, 4),
        sharey = True,
    )

    for i_patch in range( n_patches ):

        cur_patch_frames = test_patch_frames_filtered[i_patch]

        for i_t in range( n_frame_compare ):

            axs[i_patch, i_t].imshow(
                cur_patch_frames[i_frame_compare[i_t], :, :],
                #
                clim = (0, patch_cmax),
                cmap = 'afmhot',
            )

            if i_patch == 0:
                t_val = i_frame_compare[i_t] * dt - t_intervene
                axs[i_patch, i_t].set_title( f'{t_val:+0.0f}s', fontsize = 12 )

            if i_patch == (n_patches - 1) and i_t == 0:
                axs[i_patch, i_t].tick_params( axis = 'x', length = 3 )
                axs[i_patch, i_t].set_xticks( [-0.5, patch_size_x+0.5], ['0', f'{movie_frames[0].scale_x * patch_size_x:0.0f} µm'], ha = 'left' )
            else:
                axs[i_patch, i_t].tick_params( axis = 'x', length = 0 )
                axs[i_patch, i_t].set_xticks( [-0.5, patch_size_x+0.5], ['', ''] )
            
            axs[i_patch, i_t].set_yticks( [] )

            if i_t == 0:
                axs[i_patch, i_t].set_ylabel( f'Patch {i_patch+1}' )

    plt.subplots_adjust( wspace = 0.04, hspace = 0. )
Figure 4: Two example patches sub-sampled from the same recording as above in Figure 3, showing evolution of calcium activity over time relative to bath application of baclofen at \(t = 0\). (A small Gaussian smoothing kernel in space and time has been applied here, to aid visualization.)

These two patches are good illustrations of the “median” structure of this imaging data: both patches exhibit some changes after compound application (compare the right four columns with the left two); but, these changes can be subtle to the naked eye, as is particularly the case with Patch 2, shown in the bottom row of panels.

However, the image embedding model is able to pull out a good deal of structure in its latent feature embeddings—and, in particular, in dimensions of that embedding space maximally capturing the OpenAstrocytes corpus (i.e., our PCs). In particular, we see that there are axes within the patch-embedding space (PCs) that capture robust changes in the captured imaging induced by the applied compound, even if those changes were a bit more subtle to the naked eye:

Code
from astrocytes._datasets._embeddings import PatchEmbeddingTrace

## Figure params

baseline_window = (-90, -30)
plot_window = (-100, 240)
panel_verbose = False

#

wds_url = (
    'https://data.forecastbio.cloud'
    + '/testing/patch-pc-traces/bath-application/'
    + 'bath_app-dinov3_vit7b16-pca64-smooth84.tar'
)
ds = atdata.Dataset[PatchEmbeddingTrace]( wds_url )

it = ds.ordered( batch_size = None )
if panel_verbose:
    it = tqdm( it )

#

test_traces = [ None for _ in test_patches ]
for trace in it:
    try:
        assert trace.metadata is not None and 'uuid' in trace.metadata
        assert trace.metadata['uuid'] == first_movie_id
        
        for i_patch, (cur_i, cur_j) in enumerate( test_patches ):
            if trace.i_patch == cur_i and trace.j_patch == cur_j:
                test_traces[i_patch] = trace
        
        should_finish = True
        for x in test_traces:
            if x is None:
                should_finish = False
        if should_finish:
            break

    except:
        # print( trace.metadata['uuid'], trace.i_patch, trace.j_patch )
        continue

#

import matplotlib.pyplot as plt
from forecast_style import HouseStyle

def _baseline_normalize( ts, ys, window ):
    filter_cur = (
        (ts >= window[0])
        & (ts < window[1])
    )
    mean = np.mean( ys[filter_cur] )
    std = np.std( ys[filter_cur] )
    return (ys - mean) / std, mean, std

#

causal_offset = (84 * dt) / 4.
t_rel = test_traces[0].ts - t_intervene + causal_offset

with HouseStyle( grids = True ) as s:

    fig, axs = plt.subplots( n_patches, 1,
        figsize = (7, 7),
        sharex = True,
        # sharey = True,
    )

    for i_patch in range( n_patches ):
        ax = axs[i_patch]
        cur_trace = test_traces[i_patch]

        values_0_norm, mean_0, std_0 = \
            _baseline_normalize( t_rel, cur_trace.values[0, :], baseline_window )    
        values_1_norm, mean_1, std_1 = \
            _baseline_normalize( t_rel, cur_trace.values[1, :], baseline_window )
        values_2_norm, mean_2, std_2 = \
            _baseline_normalize( t_rel, cur_trace.values[4, :], baseline_window )

        filter_plot = (
            (t_rel >= plot_window[0])
            & (t_rel < plot_window[1])
        )

        ax.plot( t_rel[filter_plot], values_0_norm[filter_plot], 'C0-', label = 'PC 1' )
        ax.plot( t_rel[filter_plot], values_1_norm[filter_plot], 'C2-', label = 'PC 2' )
        ax.plot( t_rel[filter_plot], values_2_norm[filter_plot], 'C3-', label = 'PC 5' )
        
        yl = (-5.5, 10.5)

        ax.fill_between( [0, plot_window[1]], yl[0], yl[1], color = 'k', alpha = 0.05, linewidth = 0 )

        ax.plot( plot_window, [0, 0], 'k-', linewidth = 1 )
        ax.plot( [plot_window[0], plot_window[0]], yl, 'k-', linewidth = 1 )

        ax.set_xticks( [-90, 0, 60, 120, 180, 240] )
        # ax.set_yticks( [0, 5, 10] )
        ax.set_xlim( plot_window )
        ax.set_ylim( yl )

        ax.set_ylabel( f'Patch {i_patch + 1}' )

        if i_patch == (n_patches - 1):
            ax.set_xlabel( 'Time (s)' )
            # ax.set_ylabel( '∆ (baseline SD)' )

            plt.legend( fontsize = 12, loc = 'upper left' )

            ax.text( 5, 8.5, '+ Drug', alpha = 0.5 )
Figure 5: For each of the two astrocyte network patches shown above in Figure 4, the time-evolution of three representative embedding PCs relative to the onset of bath application of baclofen at \(t = 0\). Time shown as seconds relative to compound entering the bath (shaded box); changes shown as multiples of the baseline distribution standard deviation. (n.b.—A 60s boxcar filter in time has been applied to each PC, leading to blurring of activation edges.)

Response heterogeneity across the imaging field

The traces above depict only three of the 64 PCs we used as our bath application–specific subspace—and, only two of the 196 (14-by-14) patches yielded from inference on this experimental session.

These 64 PCs captured a large majority of the observed variance in the 4096-dimensional DINOv3 patch-embedding features; hence, while our astrocyte two-photon calcium dynamics data is quite rich, as expected the statistical structure of these specific sorts of biological images occupies a very small subspace of the full spectrum of naturalistic images.

When we look at the feature dynamics across all patches within the imaging field, the full richness of this method in characterizing becomes apparent:

Code
from astrocytes._datasets._embeddings import PatchEmbeddingTrace

#

def _get_movie_traces( movie_uuid: str, verbose: bool = False ) -> list[list[PatchEmbeddingTrace]]:

    # TODO Created `wids` indexed WebDataset for easier indexing in demos

    wds_url = (
        'https://data.forecastbio.cloud/testing/patch-pc-traces/bath-application/'
        + 'bath_app-dinov3_vit7b16-pca64-smooth84.tar'
    )
    ds = atdata.Dataset[PatchEmbeddingTrace]( wds_url )

    started_movie = False
    movie_traces: list[PatchEmbeddingTrace] = []

    if verbose:
        print( 'Iterating traces...' )
    for trace in ds.ordered( batch_size = None ):
        try:
            assert trace.metadata is not None
            assert 'uuid' in trace.metadata

            if trace.metadata['uuid'] == movie_uuid:
                if verbose and not started_movie:
                    print( 'Snagging correct movie...' )
                started_movie = True
                movie_traces.append( trace )

            else:
                if started_movie:
                    # We've moved to a new movie uuid, and so finished up
                    if verbose:
                        print( 'Got it!' )
                    break
        except:
            print( 'Skipping trace - no metadata' )
            continue
    
    # Arrange nicely
    ret = [
        [
            None
            for j in range( N_PATCHES_X )
        ]
        for i in range( N_PATCHES_Y )
    ]

    for trace in movie_traces:
        ret[trace.i_patch][trace.j_patch] = trace

    return ret


##

# Collate full dataset

first_movie_traces = _get_movie_traces( first_movie_id, verbose = False )

# Do plot

i_pc = 0

with HouseStyle() as s:

    fig, axs = plt.subplots( N_PATCHES_Y, N_PATCHES_X,
        figsize = (16, 12),
        # sharex = True,
        # sharey = True,
    )

    for i in range( N_PATCHES_Y ):
        for j in range( N_PATCHES_X ):
            ax = axs[i, j]
            trace = first_movie_traces[i][j]
            ts_rel = trace.ts - t_intervene + causal_offset

            cur_values_norm, cur_mean, cur_std = \
                _baseline_normalize( t_rel, trace.values[i_pc, :], baseline_window )  

            filter_plot = (
                (ts_rel >= plot_window[0])
                & (ts_rel < plot_window[1])
            )

            ax.plot( ts_rel[filter_plot], cur_values_norm[filter_plot], f'C0-', linewidth = 1.5 )
            
            xl = ax.get_xlim()
            ax.plot( xl, [0, 0], 'k-', linewidth = 1, zorder = -200 )
            # yl = ax.get_ylim()
            yl = (-10.5, 10.5)
            ax.fill_between( [0, xl[1]], yl[0], yl[1], color = 'k', alpha = 0.05, linewidth = 0, zorder = 100 )

            if (i, j) in test_patches:
                ax.fill_between( [xl[0], xl[1]], yl[0], yl[1], facecolor = 'none', edgecolor = f'k', alpha = 0.7, linewidth = 2., zorder = -300 )
            
            ax.set_xlim( xl )
            ax.set_ylim( yl )

            ax.set_yticks( [] )
            ax.set_xticks( [] )
    
    for i in range( 1, N_PATCHES_X ):
        axs[-1, i].set_xticks( [-90, 0, 240], ['', '', ''] )
    axs[-1, 0].set_xticks( [-90, 0, 240], ['–90', '0', '240'], fontsize = 8, rotation = 90 )

    for i in range( 0, N_PATCHES_Y - 1 ):
        # axs[i, 0].set_yticks( [-10, 0, 10], ['', '', ''] )
        pass
    axs[-1, 0].set_yticks( [-10, 0, 10], ['-10', '0', '10'], fontsize = 8 )

    fig.subplots_adjust( wspace = 0.1, hspace = 0.1 )
Figure 6: Changes in DINOv3 embedding PC1 features after bath application of baclofen for each of the 196 astrocyte network patches derived from the slice depicted above in Figure 3; the locations of the two patches shown in greater detail in Figure 4 and Figure 5 are highlighted in black boxes. Note the heterogeneity of responses across the imaging field, as well as the correspondence between portions of the network exhibiting PC1 activation and anatomical features of the local astrocyte network in Figure 3. Time shown as seconds relative to compound entering the bath (shaded box); changes shown as multiples of the baseline distribution standard deviation.
Code
# Do plot

i_pc = 1

with HouseStyle() as s:

    fig, axs = plt.subplots( N_PATCHES_Y, N_PATCHES_X,
        figsize = (16, 12),
        # sharex = True,
        # sharey = True,
    )

    for i in range( N_PATCHES_Y ):
        for j in range( N_PATCHES_X ):
            ax = axs[i, j]
            trace = first_movie_traces[i][j]
            ts_rel = trace.ts - t_intervene + causal_offset

            cur_values_norm, cur_mean, cur_std = \
                _baseline_normalize( t_rel, trace.values[i_pc, :], baseline_window )  

            filter_plot = (
                (ts_rel >= plot_window[0])
                & (ts_rel < plot_window[1])
            )

            ax.plot( ts_rel[filter_plot], cur_values_norm[filter_plot], f'C2-', linewidth = 1.5, zorder = 200 )
            
            xl = ax.get_xlim()
            ax.plot( xl, [0, 0], 'k-', linewidth = 1, zorder = -200 )
            # yl = ax.get_ylim()
            yl = (-15.5, 15.5)
            ax.fill_between( [0, xl[1]], yl[0], yl[1], color = 'k', alpha = 0.05, linewidth = 0, zorder = 100 )

            if (i, j) in test_patches:
                ax.fill_between( [xl[0], xl[1]], yl[0], yl[1], facecolor = 'none', edgecolor = f'k', alpha = 0.7, linewidth = 2., zorder = -300 )
            
            ax.set_xlim( xl )
            ax.set_ylim( yl )

            ax.set_yticks( [] )
            ax.set_xticks( [] )
    
    for i in range( 1, N_PATCHES_X ):
        axs[-1, i].set_xticks( [-90, 0, 240], ['', '', ''] )
    axs[-1, 0].set_xticks( [-90, 0, 240], ['–90', '0', '240'], fontsize = 8, rotation = 90 )

    for i in range( 0, N_PATCHES_Y - 1 ):
        # axs[i, 0].set_yticks( [-10, 0, 10], ['', '', ''] )
        pass
    axs[-1, 0].set_yticks( [-15, 0, 15], ['-15', '0', '15'], fontsize = 8 )

    fig.subplots_adjust( wspace = 0.1, hspace = 0.1 )
Figure 7: As in Figure 6, but for embedding PC2 features. Note that while the distribution of loci that exhibit modulation post-application still respects the underlying anatomy of Figure 3, the spatial and temporal distribution is distinct from the changes seen for PC1. Time shown as seconds relative to compound entering the bath (shaded box); changes shown as multiples of the baseline distribution standard deviation.

Components of static anatomy and dynamic physiology in patch embeddings

Because the original PCA used to detemrine the bath application–specific subspace of DINOv3 patch embeddings was performed across the full dataset, the image feature variance optimized in the method incorporated contributions from two distinct facets of the image structure: first, the contribution from variations across the static astrocyte network anatomical architectures imaged within individual patch; and second, the physiological dynamics that unfold within an individual patch.

In order to appreciate the structure of these two distinct contributions within the PC subspace projections in the bath application dataset, we can take a look at a handful of representative patches to visualize the way astrocyte network patches “dance” around PC-space:

Code
## Figure parameters
random_seed = 89
n_traces_show_raw = 100

##

rng = np.random.default_rng( random_seed )
traces_show_raw = rng.permutation( n_traces_show )[:n_traces_show_raw]

##

with HouseStyle() as s:
    fig = plt.figure( figsize = (10, 10) )

    ax = fig.add_subplot( projection = '3d' )
    ax.view_init( azim = -62 )

    #

    it = traces_show_raw
    if panel_verbose:
        it = tqdm( it )

    for i_trace in it:
        # trace start
        ax.scatter(
            trace_data_raw_show[i_trace, 0, 0],
            trace_data_raw_show[i_trace, 1, 0],
            trace_data_raw_show[i_trace, 2, 0],
            color = 'k',
            s = 3
        )
        # trace end
        ax.scatter(
            trace_data_raw_show[i_trace, 0, -1],
            trace_data_raw_show[i_trace, 1, -1],
            trace_data_raw_show[i_trace, 2, -1],
            facecolor = f'C{i_trace % 4}',
            edgecolor = 'k',
            linewidth = 0.5,
            s = 8
        )
        # trajectory
        ax.plot(
            trace_data_raw_show[i_trace, 0, :],
            trace_data_raw_show[i_trace, 1, :],
            trace_data_raw_show[i_trace, 2, :],
            f'C{i_trace % 4}-',
            linewidth = 0.5,
            alpha = 0.8,
        )

    ax.set_xticks( [0], '' )
    ax.set_yticks( [0], '' )
    ax.set_zticks( [0], '' )

    ax.set_xlabel( f'PC {pcs_show[0]+1}' )
    ax.set_ylabel( f'PC {pcs_show[1]+1}' )
    ax.set_zlabel( f'PC {pcs_show[2]+1}', rotation = 90 )

    ax.set_xlabel( f'PC {pcs_show[0]+1}', rotation = -14 )
    ax.set_ylabel( f'PC {pcs_show[1]+1}', rotation = 37 )
    ax.set_zlabel( f'PC {pcs_show[2]+1}', rotation = 90 )
    
    ax.set_box_aspect((1, 1, 1.1), zoom=0.85)
Figure 8: Three-dimensional slice of patch embedding PCs for a random selection of 100 astrocyte network patches as they unfold in time; colors are randomly selected for each individual patch’s dynamics. Black dots indicate the embedding of the network patch when the applied compound enters the bath at \(t = 0\); colored dots with black circles indicate the embedding at $t = $4 min after application, with the colored line between the two dots indicating the embedding-space trajectory of the patch across the recording. Among this ~1% subset of patches within the dataset, trajectories are fairly separable, indicating contributions from the overall differences in underlying astrocytic anatomy within the patch image; and, as expected, a number of the patches are relatively unresponsive to the applied compound, with only a small degree of motion in embedding-space across the recording (wiggly lines). However, a subset of the patches show strong, directed deviations after compound application, indicating reflections of physiological modulation in the patch’s image statistics. Grey lines indicate the PC-space origin.

As expected, the largest contributor to the heterogeneity of individual frames’ latent embeddings appears to come from the differences in anatomical structure—that is, from the regularities of the frames’ image content arising from the difference in the shape of the cells within the specific patch—as opposed to physiologic changes within a static anatomical architecture. We see this in the distinctive separation of the trajectories of individual patches within embedding-space: while the positions of individual patches do change over time across the recording, these dynamic changes tend to be small relative to the separation between full patch trajectories.

However, interestingly, those (relatively small) dynamic deflections do also appear to posess their own distinct structure; this is accentuated by plotting the patch trajectories normalized to each individual patch’s baseline period:

Code
with HouseStyle() as s:
    fig = plt.figure( figsize = (10, 10) )
    ax = fig.add_subplot( projection = '3d' )
    ax.view_init( elev = 20 )

    # for i_trace in tqdm( traces_show_raw ):
    for i_trace in traces_show_raw:
        # start
        ax.scatter(
            trace_data_show[i_trace, 0, 0],
            trace_data_show[i_trace, 1, 0],
            trace_data_show[i_trace, 2, 0],
            color = 'k',
            s = 3
        )
        # end
        ax.scatter(
            trace_data_show[i_trace, 0, -1],
            trace_data_show[i_trace, 1, -1],
            trace_data_show[i_trace, 2, -1],
            facecolor = f'C{i_trace % 4}',
            edgecolor = 'k',
            linewidth = 0.5,
            s = 8
        )
        # trajectory
        ax.plot(
            trace_data_show[i_trace, 0, :],
            trace_data_show[i_trace, 1, :],
            trace_data_show[i_trace, 2, :],
            f'C{i_trace % 4}-',
            linewidth = 0.5,
            alpha = 0.8,
        )
    
    ax.set_xlim( -32, 32 )
    ax.set_ylim( -32, 32 )
    ax.set_zlim( -12, 52 )

    ax.tick_params(axis='y', which='major', pad=-5)

    ax.set_xticks( [-30, 0, 30], ['', '', ''], fontsize = 12 )
    ax.set_yticks( [-30, 0, 30], ['–30', '', '+30'], rotation = -55, ha = 'left', fontsize = 12 )
    ax.set_zticks( [-10, 0, 50], ['–10', '', '+50'], fontsize = 12 )

    ax.set_xlabel( f'PC {pcs_show[0]+1}', rotation = -14 )
    ax.set_ylabel( f'PC {pcs_show[1]+1}', rotation = 37 )
    ax.set_zlabel( f'PC {pcs_show[2]+1}', rotation = 90 )
    
    ax.set_box_aspect( (1, 1, 1.1), zoom=0.85 )
    # fig.tight_layout()
Figure 9: Embeddings of the same patches from Figure 8, but normalized to the fluctuations of each patch within the baseline period before application of pharmacology. This view highlights that there are . Scale shown as multiples of the per-patch baseline standard deviation for each PC.

Put together, these results indicate that the latent structure of patch embeddings from DINOv3—a large, self-supervised image model without any specific biological emphasis in training—contains subspaces rich in information about both the anatomical structure of cells within the astrocyte network (as captured through two-photon imaging of our particular intracellular calcium indicator). This is a remarkable testament to both the intricate order of the compositional structure within image data writ large, as well as to the remarkable power that the self-supervised learning objective has in yielding models able to internalize many highly-nontrivial features of this intricate structure within the statistics of naturalistic images, even within highly-specific domains.

Image embedding features linearly separate physiological impacts of distinct compounds

The richness, in particular, of these embeddings’ portrait of physiologic changes within patches after the application of pharmacology—even if not the largest contributor to the layout of embeddings within a vanilla large image model like DINOv3—raises the question of whether a small subspace of imaging features contains information about the identity of the applied compound. This would extend our prior work demonstrating that specific more classical features of astrocyte calcium dynamics can encode this information at the more nonlocal level of full imaging fields (1) to a potential encoding within individual small network patches.

To investigate the extent of these local differences, we can look at one of the same representatives patch we did in Figure 4, but across multiple chemical exposures:

Code
test_patch_compare = test_patches[0]

movies_compare = [
    first_movie_id,
    comparison_movie_id,
]
frame_blocks_compare = [
    movie_frames,
    comparison_movie_frames,
]

test_patch_frames_compare = []
cur_i, cur_j = test_patch_compare

for cur_frames in frame_blocks_compare:

    patch_idx_y = ( int( np.floor( patch_size_y * cur_i ) ),
                    int( np.ceil( patch_size_y * (cur_i + 1) ) ) )
    patch_idx_x = ( int( np.floor( patch_size_y * cur_j ) ),
                    int( np.ceil( patch_size_y * (cur_j + 1) ) ) )

    test_patch_frames_compare.append(
        np.array( [ 
            frame.image[0][patch_idx_y[0]:patch_idx_y[1], :][:, patch_idx_x[0]:patch_idx_x[1]]
            for frame in cur_frames
        ] )
    )

test_patch_frames_compare_filtered = [
    gaussian_filter( fs, sigma = (3., 0.6, 0.6) )
    for fs in test_patch_frames_compare
]

#

dt = movie_frames[3].t
t_intervene = 300

cmax = 70
i_frame_compare = (250, 350, 450, 550, 650, 750)

n_series = len( test_patch_frames_compare )
n_frame_compare = len( i_frame_compare )

#

with HouseStyle() as s:

    fig, axs = plt.subplots( n_patches, n_frame_compare,
        figsize = (10, 4),
        sharey = True,
    )

    for i_series in range( n_series ):

        cur_patch_frames = test_patch_frames_compare_filtered[i_series]
        cur_compound = frame_blocks_compare[i_series][0].applied_compound

        for i_t in range( n_frame_compare ):
            ax = axs[i_series, i_t]

            ax.imshow(
                cur_patch_frames[i_frame_compare[i_t], :, :],
                #
                clim = (-cmax, cmax),
                cmap = COMPOUND_CMAPS[cur_compound],
            )

            if i_series == 0:
                t_val = i_frame_compare[i_t] * dt - t_intervene
                ax.set_title( f'{t_val:+0.0f}s', fontsize = 12 )

            if i_series == (n_patches - 1) and i_t == 0:
                ax.tick_params( axis = 'x', length = 3 )
                ax.set_xticks( [-0.5, patch_size_x+0.5], ['0', f'{movie_frames[0].scale_x * patch_size_x:0.0f} µm'], ha = 'left' )
            else:
                # axs[i_patch, i_t].set_xticks( [-0.5, patch_size_x+0.5], ['', ''] )
                ax.tick_params( axis = 'x', length = 0 )
                ax.set_xticks( [-0.5, patch_size_x+0.5], ['', ''] )
            
            ax.set_yticks( [] )

            if i_t == 0:
                ax.set_ylabel( f'{cur_compound}', color = COMPOUND_COLORS[cur_compound] )

    # s.label( xaxis_y = 0 )
    plt.subplots_adjust( wspace = 0.04, hspace = 0. )
Figure 10: Example frames of patch 1 from Figure 4 above, now showing evolution of calcium activity over time within the same patch relative to bath application of either baclofen (top row) or t-ACPD (bottom row). Note that, consistent with the findings from (1), t-ACPD evokes a much larger response in this astrocyte patch. (A small Gaussian smoothing kernel in space and time has been applied here, to aid visualization.)

We see by eye that t-ACPD evokes a considerably larger response in this patch, and this is reflected in the patch embeddings: while in Figure 5 above we were able to clearly distinguish the subtle modulations induced by the introduction of baclofen, when placed on the same scale (top row) as the changes induced in this same patch by t-ACPD (bottom row), in Figure 11 below, the difference is evident:

Code
from tqdm import tqdm

import atdata

import astrocytes
from astrocytes._datasets._embeddings import (
    PatchEmbeddingTrace,
)

#

t_intervene = 300
baseline_window = (-100, -15)
plot_window = (-100, 240)

#

wds_url = (
    'https://data.forecastbio.cloud/testing/patch-pc-traces/bath-application/'
    + 'bath_app-dinov3_vit7b16-pca64-smooth84.tar'
)
ds = atdata.Dataset[PatchEmbeddingTrace]( wds_url )

test_traces_compare = [ None for _ in movies_compare ]
# for trace in tqdm( ds.ordered( batch_size = None ) ):
for trace in ds.ordered( batch_size = None ):
    try:
        assert trace.metadata is not None and 'uuid' in trace.metadata
        assert trace.metadata['uuid'] in movies_compare
        
        i_movie = movies_compare.index( trace.metadata['uuid'] )

        if trace.i_patch == test_patch_compare[0] and trace.j_patch == test_patch_compare[1]:
            test_traces_compare[i_movie] = trace
            # print( 'Caught', trace.metadata['uuid'] )
        
        should_finish = True
        for x in test_traces_compare:
            if x is None:
                should_finish = False
        if should_finish:
            break

    except:
        # print( trace.metadata['uuid'], trace.i_patch, trace.j_patch )
        continue

#

causal_offset = (84 * dt) / 4.
t_rel = test_traces[0].ts - t_intervene + (causal_offset)

with HouseStyle( grids = True ) as s:

    fig, axs = plt.subplots( n_patches, 1,
        figsize = (7, 12),
        sharex = True,
        # sharey = True,
    )

    for i_series in range( n_patches ):
        ax = axs[i_series]
        cur_trace = test_traces_compare[i_series]
        cur_compound = frame_blocks_compare[i_series][0].applied_compound

        values_0_norm, mean_0, std_0 = \
            _baseline_normalize( t_rel, cur_trace.values[0, :], baseline_window )    
        values_1_norm, mean_1, std_1 = \
            _baseline_normalize( t_rel, cur_trace.values[1, :], baseline_window )
        values_2_norm, mean_2, std_2 = \
            _baseline_normalize( t_rel, cur_trace.values[4, :], baseline_window )

        filter_plot = (
            (t_rel >= plot_window[0])
            & (t_rel < plot_window[1])
        )

        ax.plot( t_rel[filter_plot], values_0_norm[filter_plot], 'C0-', label = 'PC 1' )
        ax.plot( t_rel[filter_plot], values_1_norm[filter_plot], 'C2-', label = 'PC 2' )
        ax.plot( t_rel[filter_plot], values_2_norm[filter_plot], 'C3-', label = 'PC 5' )
        
        yl = (-6, 85)

        ax.fill_between( [0, plot_window[1]], yl[0], yl[1], color = 'k', alpha = 0.05, linewidth = 0 )

        ax.plot( plot_window, [0, 0], 'k-', linewidth = 1 )
        ax.plot( [plot_window[0], plot_window[0]], yl, 'k-', linewidth = 1 )

        ax.set_xticks( [-90, 0, 60, 120, 180, 240] )
        # ax.set_yticks( [0, 5, 10] )
        ax.set_xlim( plot_window )
        ax.set_ylim( yl )

        ax.set_ylabel( f'{cur_compound}', color = COMPOUND_COLORS[cur_compound] )

        if i_series == (n_patches - 1):
            ax.set_xlabel( 'Time (s)' )
            # ax.set_ylabel( '∆ (baseline SD)' )

            plt.legend( fontsize = 12, loc = 'upper left' )

            ax.text( 5, 8.5, '+ Drug', alpha = 0.5 )
Figure 11: For the astrocyte network patch depicted in Figure 10, the time-evolution of three representative embedding PCs relative to the onset of bath application of baclofen (top row) or t-ACPD (bottom row) at \(t = 0\). Time shown as seconds relative to compound entering the bath (shaded box); changes shown as multiples of the baseline distribution standard deviation. (n.b.:A 60s boxcar filter in time has been applied to each PC, leading to blurring of activation edges.)

The fact that these feature axes within patch embedding–space separate responses to the two compounds is not unique to this patch or this cortical slice: in fact, we can average across the responses in all patches in all recordings throughout the entire dataset, in order to view the unique profiles of dynamic responses to baclofen (red) and t-ACPD (blue):

Code
pcs_show_testing = (0, 1, 2, 3, 4, 5, 6, 7)

n_pcs_show = len( pcs_show_testing )

with HouseStyle( grids = True ) as s:

    fig, axs = plt.subplots( n_pcs_show // 2, 2,
        figsize = (11, 12),
        sharex = True,
    )

    for i_seq, i_pc in enumerate( pcs_show_testing ):
        i_ax = i_seq // 2
        j_ax = i_seq % 2
        ax = axs[i_ax, j_ax]

        ts_rel = trace.ts - t_intervene + causal_offset
        filter_baseline = (
            (ts_rel >= baseline_window[0])
            & (ts_rel < baseline_window[1])
        )

        raw_values = {
            c: np.array( [
                _baseline_normalize(
                    ts_rel,
                    trace.values[i_pc, :],
                    baseline_window,
                )[0]
                for trace in traces
            ] )
            for c, traces in compound_traces.items()
        }

        middle_trace = {
            c: np.mean( vs, axis = 0 )
            for c, vs in raw_values.items()
        }

        # print( { c: vs.shape for c, vs in raw_values.items() } )

        middle_boots = {
            c: boot_stat( vs, lambda x: np.mean( x, axis = 0 ), axis = 0,
                n = 200,
                verbose = False,
            )
            for c, vs in raw_values.items()
        }
        # print( { c: vs.shape for c, vs in middle_boots.items() } )
        middle_low = {
            c: np.quantile( vs, 0.025, axis = 0 )
            for c, vs in middle_boots.items()
        }
        middle_high = {
            c: np.quantile( vs, 0.975, axis = 0 )
            for c, vs in middle_boots.items()
        }

        filter_plot = (
            (ts_rel >= plot_window[0])
            & (ts_rel < plot_window[1])
        )

        #

        for c in middle_trace.keys():
            ax.plot( ts_rel[filter_plot], middle_trace[c][filter_plot], '-',
                color = COMPOUND_COLORS[c]
            )
            ax.fill_between(
                ts_rel[filter_plot],
                middle_low[c][filter_plot],
                middle_high[c][filter_plot],
                color = COMPOUND_COLORS[c],
                alpha = 0.2,
                linewidth = 0.5,
            )

        xl = plot_window
        yl = ax.get_ylim()
        # yl = (-6, 6)

        ax.plot( xl, [0, 0], 'k-', linewidth = 1 )

        ax.fill_between( [0, xl[1]], yl[0], yl[1],
            color = 'k',
            alpha = 0.05,
            linewidth = 0,
        )

        ax.set_xlim( xl )
        ax.set_ylim( yl )

        ax.set_ylabel( f'PC {i_pc + 1}' )
        ax.set_xticks( [-90, 0, 60, 120, 180, 240] )
    
    fig.subplots_adjust( wspace = 0.2, hspace = 0.1 )
Figure 12: Average change in first seven principal components of patch embeddings relative to application of baclofen (red) or t-ACPD (blue). Changes normalized to the mean and standard deviation of each patch in the baseline period. (Mean ± trace-wise bootstrapped 95% confidence intervals.)

Amazingly, we see that these feature axes capture consistent differences in the evoked imaging features between the two compounds, across a large array of different individual mice, slices, and experiments.

The question then arises: are these consistent differences robust enough to decode the compound applied to a particular patch of astrocyte network? We can test this by using a simple generalized linear readout (via logistic regression) of our PC-space to classify the applied compound. To understand in more detail the dynamics of this potential information content about the applied compound, we’ll restrict ourselves to looking at decoding from the activity in a single patch at a single time-point after compound application, and then vary across all time-points within the experiment.

Looking at this decoding performance across time reveals some interesting patterns:

Code
with HouseStyle( grids = True ) as s:
    fig, ax = plt.subplots( figsize = (8, 5) )

    #

    for i_seq, (n_pc_cur, acc_cv_cur) in enumerate( zip( n_pc_try, acc_cv ) ):
        middle_trace = np.mean( acc_cv_cur.T, axis = 0 )
        middle_boots = boot_stat( acc_cv_cur.T, lambda x: np.mean( x, axis = 0 ), axis = 0,
            n = 1_000,
            verbose = False
        )
        middle_low = np.quantile( middle_boots, 0.005, axis = 0 )
        middle_high = np.quantile( middle_boots, 0.995, axis = 0 )

        ax.plot( ts_rel[idx_iter], middle_trace, label = f'{n_pc_cur} PCs', zorder = -i_seq * 10 )
        ax.fill_between( ts_rel[idx_iter], middle_low, middle_high,
            alpha = 0.2,
            zorder = -1 - i_seq * 10,
        )

    #

    middle_trace = np.mean( acc_cv_shuffle.T, axis = 0 )

    middle_boots = boot_stat( acc_cv_shuffle.T, lambda x: np.mean( x, axis = 0 ), axis = 0,
        n = 1_000,
        verbose = False
    )
    middle_low = np.quantile( middle_boots, 0.005, axis = 0 )
    middle_high = np.quantile( middle_boots, 0.995, axis = 0 )

    ax.plot( ts_rel[idx_iter], middle_trace, color = 'k', label = 'Permuted', zorder = -100 )
    ax.fill_between( ts_rel[idx_iter], middle_low, middle_high,
        color = 'k',
        alpha = 0.2,
        zorder = -101
    )

    xl = window_decode
    # yl = ax.get_ylim()
    yl = (0.36, 1.04)

    ax.plot( xl, [0, 0], 'k-', linewidth = 1 )

    ax.fill_between( [0, xl[1]], yl[0], yl[1],
        color = 'k',
        alpha = 0.05,
        linewidth = 0,
        zorder = -200
    )


    ax.text( 5, 0.75, '+ Drug', alpha = 0.5, va = 'center' )

    ax.set_xlabel( 'Time (s)' )
    ax.set_ylabel( f'Decoding accuracy' )

    plt.legend( fontsize = 12 )

    s.label( xaxis_y = 0.36, data_xlim=(-60, 240))
    ax.set_xticks( [-60, 0, 60, 120, 180, 240] )
    ax.set_xlim( xl )
    ax.set_ylim( yl )
Figure 13: Accuracy of classifying applied pharmacology from embeddings of a single astrocyte network patch, as a function of time after application of the compound. Colored bands indicate bootstrapped confidence interval of accuracy across cross-validation replicates. Each trace color corresponds to a distinct number of included PCs in the decoder. The black trace indicates a null condition in which the compound labels were permuted at each time point, but decoding was performed otherwise in the same manner.

Not only are we able to decode quite well the identity of the applied compound from a single ~60µm patch of the astrocyte network, but it is clear that the particular richness of these imaging features is necessary in order to tease the compound differences apart. This indicates that it is not a simple matter of intensity at play here to distinguish systematically between the two compounds: rather, a sparse but rich subspace of the overall DINOv3 image feature patch embedding space contains the information content needed to identify . This may indicate that these image features contain a nontrivial reflection, via the observed calcium activity, of underlying differences in physiological processes recruited by these two distinct pathways (i.e., GABAB vs mGluR3 receptors).

Discussion

This brief foray into analyzing the OpenAstrocytes data demonstrates the remarkable richness present in astrocyte calcium dynamics, as well as the amazing potential that lies in making this data readily available in a format that slots directly into today’s AI-driven biological data science workflows.

We’re particularly interested in highlighting three key patterns we’ve seen here:

  1. Generalist AI features contain specialized information: While DINOv3 was trained on a very broad image corpus, we were able to see both anatomical features of local astrocyte networks, as well as physiological features of the dynamics of intracellular astrocyte calcium, reflected in small subspaces of these embeddings. This has large implications for the application of these generalist large models to biological data science.
  2. Heterogeneity of responses across individual fields: It is amazing to see consistency in the responses of particular subspaces to particular drugs; however, we’re very excited to dig into what underlies the differences between responses in specific small pieces of the same slice of brain tissue. We think there’s a lot more about astrocyte biology hidden in the splitting of these differences.
  3. Decoding performance from rich features: We saw great chemical decoding performance from a single frame of one ~60µm patch of tissue—and this is a pessimistic estimate, given how many patches were unresponsive due to differences in expression of the calcium indicator! There’s a lot to tease apart in the richness of the heterogeneity we saw, particularly in the time-evolution. While we didn’t need to include all of the DINOv3 embedding features to get good decoding, we needed a few; we’re excited about this pattern of results, as it suggests that the decoding performance isn’t arising from some trivial feature (like increased intensity), but instead from a richer combination of features that could generalize to more interesting predictive tasks.

 


Looking ahead

The OpenAstrocytes corpus is one step in our larger thrust into providing open tools to understand the dynamics of astrocyte physiology, as well as our approach within Forecast to use this incredible science to understand how our chemical tools change brain networks.

Building on atdata

atdata’s typing functionality already helps for keeping datasets sane; but, where keeping track of sample types really shines is when we register mappings between types. As an example, take this from the astrocytes library accompanying this pub:

@atdata.lens
def from_generic( s: ts.Frame ) -> 'BathApplicationFrame':
    
    assert s.metadata is not None
    
    return BathApplicationFrame(
        # TODO
        applied_compound = 'unknown',
        image = s.image,
        t_index = s.metadata['frame']['t_index'],
        t = s.metadata['frame']['t'],
        #
        date_acquired = s.metadata['date_acquired'],
        movie_uuid = s.metadata['uuid'],
    )

Under the hood, atdata.lens keeps track of the full registry lenses between sample schema types. Because lenses can have a very nice compositional behavior (9), this design enables us, as we’re building AI models from a bunch of different sources, to easily and automatically aggregate across our diverse hive of WebDatasets, built across different modalities and experiments, taking advantage of the natural ways these data can be compatibly merged.

The full atdata ecosystem is architected to go one step further, placing metadata for type schemas of scientific datasets—and the code for these lenses between them!—in a public, decentralized repository on ATProto, the protocol layer underlying the Bluesky social network. This allows AI coding agents on data science tasks to automatically leverage domain knowledge of structured semantic relationships to decide what data to pull from for a given task.

More on atdata soon, including our ATProto appview for the full distributed dataset network with MCP support—sign up for updates to hear first as we push more features to the public release!

Loving neurons, too

Of course, the brain also is one-third neurons—a fact we certainly wouldn’t want to forget. This is why we’re building out the OpenAstrocytes repository with newly-collated data from the Allen Brain Observatory (10), also formatted with atdata for easy use with existing AI workflows.

from astrocytes.data import allen
from astrocytes.data.allen.schema import BrainObservatoryFrame

dataset = (
    allen
        .generic.braion_observatory
        .dataset
        .as_type( BrainObservatoryFrame )
)
# Pull an example sample
sample = next( x for x in dataset.shuffled( batch_size = None ) )

with HouseStyle() as s:
    fig, ax = plt.figure( figsize = (8, 8) ))
    plt.imshow( sample.image )

We’re continuing to collate this fantastic dataset’s movies of raw two-photon calcium imaging from neurons in visual cortex, as it provides another phenomenal source of neurobiological imaging data for augmenting fine-tuning workflows; since neuronal and astrocytic imaging share tremendous overlap in latent image statistics, we anticipate that combining large datasets of similar dynamic imaging from the two cell types will provides immense benefit to the study of each.

A new kind of data

At Forecast, we’re building out a new experimental dataset taken not from , but instead from 3D in vitro systems built from human induced pluripotent stem cells. Human astrocyte dynamics are a near-completely unexplored frontier within neuroscience; we are extremely excited about the potential lying within this system, as we build out OpenAstrocytes in the next few months with more and more data taken from this completely blank canvas. Stay tuned ;).

 


 

Acknowledgements

We’d like to thank Dr. Michelle Cahill and Dr. Michael Reitman for driving the original experimental work and data collection for the images that went into the initial release of OpenAstrocytes presented here.

Support for the production of OpenAstrocytes at Forecast was generously provided by the Special Initiatives division of the Astera Institute.

Citation

Please reference this work in BibTeX as:

@article{levesque2025openastrocytes,
  author = {Maxine Levesque and Kira Poskanzer},
  title = {OpenAstrocytes},
  journal = {Forecast Research},
  year = {2025},
  note = {https://forecast.bio/research/open-astrocytes/},
}

AI use

No language model tools were used to produce any of the code for or writing in this pub.

References

1.
M. K. Cahill, M. Collard, V. Tse, M. E. Reitman, R. Etchenique, C. Kirst, K. E. Poskanzer, Network-level encoding of local neurotransmitters in cortical astrocytes. Nature 629, 146–153 (2024).
2.
Y. Wang, N. V. DelRosso, T. V. Vaidyanathan, M. K. Cahill, M. E. Reitman, S. Pittolo, X. Mi, G. Yu, K. E. Poskanzer, Accurate quantification of astrocyte and neurotransmitter fluorescence dynamics for single-cell and population-level physiology. Nature neuroscience 22, 1936–1944 (2019).
3.
A. Aizman, G. Maltby, T. Breuel, High performance i/o for large scale deep learning (2020). https://arxiv.org/abs/2001.01858.
4.
nLab authors, Vertical categorification (2025).
5.
M. E. Reitman, V. Tse, X. Mi, D. D. Willoughby, A. Peinado, A. Aivazidis, B.-E. Myagmar, P. C. Simpson, O. A. Bayraktar, G. Yu, others, Norepinephrine links astrocytic activity to regulation of cortical state. Nature Neuroscience 26, 579–593 (2023).
6.
E. Fino, R. Araya, D. S. Peterka, M. Salierno, R. Etchenique, R. Yuste, RuBi-glutamate: Two-photon and visible-light photoactivation of neurons and dendritic spines. Frontiers in neural circuits 3, 556 (2009).
7.
Y. Wang, D. J. Miller, K. Poskanzer, Y. Wang, L. Tian, G. Yu, “Graphical time warping for joint alignment of multiple curves” in Advances in Neural Information Processing Systems, D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, R. Garnett, Eds. (Curran Associates, Inc., 2016; https://proceedings.neurips.cc/paper_files/paper/2016/file/f0bbac6fa079f1e00b2c14c1d3c6ccf0-Paper.pdf)vol. 29.
8.
O. Siméoni, H. V. Vo, M. Seitzer, F. Baldassarre, M. Oquab, C. Jose, V. Khalidov, M. Szafraniec, S. Yi, M. Ramamonjisoa, F. Massa, D. Haziza, L. Wehrstedt, J. Wang, T. Darcet, T. Moutakanni, L. Sentana, C. Roberts, A. Vedaldi, J. Tolan, J. Brandt, C. Couprie, J. Mairal, H. Jégou, P. Labatut, P. Bojanowski, DINOv3 (2025). https://arxiv.org/abs/2508.10104.
9.
nLab authors, Lens (in computer science) (2025).
10.
S. E. de Vries, J. H. Siegle, C. Koch, Sharing neurophysiology data from the allen brain observatory. Elife 12, e85550 (2023).